{
 "cells": [
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "Data Science Assistant 介绍请参考：https://github.com/modelscope/modelscope-agent/blob/master/docs/source/agents/data_science_assistant.md",
   "id": "6806543b7c528e4"
  },
  {
   "cell_type": "markdown",
   "id": "45d56c67-7439-4264-912a-c0b4895cac63",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-09-04T14:17:41.716630Z",
     "iopub.status.busy": "2023-09-04T14:17:41.716258Z",
     "iopub.status.idle": "2023-09-04T14:17:42.097933Z",
     "shell.execute_reply": "2023-09-04T14:17:42.097255Z",
     "shell.execute_reply.started": "2023-09-04T14:17:41.716610Z"
    }
   },
   "source": [
    "### clone代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3851d799-7162-4e73-acab-3c13cb1e43bd",
   "metadata": {
    "ExecutionIndicator": {
     "show": true
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!git clone https://github.com/modelscope/modelscope-agent.git"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f71e64d0-f967-4244-98ba-4e5bc4530883",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2023-09-04T14:17:41.716630Z",
     "iopub.status.busy": "2023-09-04T14:17:41.716258Z",
     "iopub.status.idle": "2023-09-04T14:17:42.097933Z",
     "shell.execute_reply": "2023-09-04T14:17:42.097255Z",
     "shell.execute_reply.started": "2023-09-04T14:17:41.716610Z"
    }
   },
   "source": [
    "### 安装特定依赖"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "489900d6-cc33-4ada-b2be-7e3a139cf6ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "! cd modelscope-agent && pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e9f3150",
   "metadata": {},
   "source": [
    "### 本地配置"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a027a6e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "os.chdir('modelscope-agent/examples/agents')\n",
    "\n",
    "import sys\n",
    "sys.path.append('../../')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3de23896",
   "metadata": {},
   "source": [
    "### API_KEY管理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "65e5dcc8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "print('请输入DASHSCOPE_API_KEY')\n",
    "os.environ['DASHSCOPE_API_KEY'] = input()\n",
    "print('请输入ModelScope Token')\n",
    "os.environ['MODELSCOPE_API_TOKEN'] = input()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8c8defa3",
   "metadata": {},
   "source": "### 构建DataScienceAssistant"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "01e90564",
   "metadata": {},
   "outputs": [],
   "source": [
    "from modelscope_agent.agents.data_science_assistant import DataScienceAssistant\n",
    "from modelscope_agent.tools.metagpt_tools.tool_recommend import TypeMatchToolRecommender\n",
    "llm_config = {\n",
    "    'model': 'qwen2-72b-instruct', \n",
    "    'model_server': 'dashscope',\n",
    "}\n",
    "data_science_assistant = DataScienceAssistant(llm=llm_config,tool_recommender=TypeMatchToolRecommender(tools=[\"<all>\"]))\n"
   ]
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 准备数据集\n",
    "Data Science Assistant在执行时，会根据当前使用的python解释器的工作目录来寻找数据集，所以需要将数据集放在正确的位置。如需查看当前工作目录，可以使用以下代码："
   ],
   "id": "c29311637b848243"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": [
    "import sys\n",
    "\n",
    "# 打印当前Python解释器的路径\n",
    "print(\"Python interpreter path:\", sys.executable)"
   ],
   "id": "2465e82452588f4a"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": "### 运行DataScienceAssistant",
   "id": "ecaf880ce9940581"
  },
  {
   "metadata": {},
   "cell_type": "code",
   "outputs": [],
   "execution_count": null,
   "source": "data_science_assistant.run(\"This is a customers financial dataset. Your goal is to predict the value of transactions for each potential customer. The target column is target. Perform data analysis, data preprocessing, feature engineering, and modeling to predict the target. Report RMSLE on the eval data. Train data path: './dataset/08_santander-value-prediction-challenge/split_train.csv', eval data path: './dataset/08_santander-value-prediction-challenge/split_eval.csv' .\")",
   "id": "fafed4d52f259f79"
  },
  {
   "metadata": {},
   "cell_type": "markdown",
   "source": [
    "### 查看结果\n",
    "可在控制台查看DataScienceAssistant的运行过程，同时可以在 /modelscope-agent/data/ 目录下查看生成的Jupyter文件和运行过程json文件。"
   ],
   "id": "3189a9ed1a6f9a38"
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
